    An Energy Reduction Scheduling Mechanism for a High-Performance SoC Architecture

    Abstract. Continuous improvements in semiconductor technology are enabling new classes of System-on-a-Chip (SoC) architectures that combine extensive processing logic with high-density memory. Such architectures, generally called Processor-in-Memory (PIM) or Intelligent Memory (I-RAM), can support high-performance computing by narrowing the performance gap between the processor and the memory. The PIM architecture combines various processors in a single chip, and these processors differ in their computation, memory-access, and power consumption capabilities. A novel parallelizing system, SAGE II, was therefore developed to identify these capabilities and dispatch the most appropriate jobs to each processor in order to exploit the advantages of PIM architectures. However, SAGE II addresses only performance, whereas power consumption is becoming an increasingly important concern in current computing systems. This paper presents a new low-power transformation mechanism, called Energy-Oriented Power Reduction Scheduling (EOPRS), that extends the capability of SAGE II. EOPRS can reduce the power consumption of the PIM system without sacrificing execution performance. The paper details the EOPRS transformation technique and discusses experimental results on several benchmarks.
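
    The abstract does not describe the EOPRS transformation itself. A minimal sketch of the underlying trade-off, assuming independent jobs with known per-processor (time, energy) estimates and using the finish time of a performance-oriented schedule as the deadline, could greedily send each job to the lowest-energy processor that still meets that deadline. The `Job` type, processor names, and cost figures are hypothetical; this is not the authors' algorithm.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    cost: dict  # hypothetical: processor name -> (execution time, energy)


def energy_aware_dispatch(jobs, processors, deadline):
    """Greedy sketch: give each job the lowest-energy processor that still
    finishes it by `deadline`; if none can, fall back to the earliest finish."""
    ready = {p: 0 for p in processors}  # time at which each processor becomes free
    assignment, total_energy = {}, 0
    for job in jobs:
        options = []
        for proc in processors:
            if proc not in job.cost:
                continue  # this job cannot run on this processor
            time, energy = job.cost[proc]
            finish = ready[proc] + time
            options.append((finish <= deadline, energy, finish, proc))
        if not options:
            raise ValueError(f"job {job.name} cannot run on any processor")
        feasible = [o for o in options if o[0]]
        if feasible:
            # Lowest energy first, earlier finish as the tie-breaker.
            _, energy, finish, proc = min(feasible, key=lambda o: (o[1], o[2]))
        else:
            # No processor meets the deadline: minimize the slowdown instead.
            _, energy, finish, proc = min(options, key=lambda o: o[2])
        ready[proc] = finish
        assignment[job.name] = proc
        total_energy += energy
    return assignment, total_energy
```

    For two identical jobs costing (2, 10) on a `host` core and (4, 3) on a memory-side `mem0` core with a deadline of 6, the first job goes to `mem0` and the second to `host`, because a second job on `mem0` would finish at time 8.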

    Golden-Finger and Back-Door: Two HW/SW Mechanisms for Accelerating Multicore Computer Systems

    Ever-increasing demands for high-performance computing lead computer systems to adopt more processors to improve parallelism and throughput. Even with multiple processing cores, complicated hardware communication mechanisms between processors can degrade overall system performance. In addition, the process scheduling mechanisms of conventional operating systems cannot fully utilize the computation power of the additional processors. Accordingly, this paper provides two mechanisms, one in software and one in hardware, to overcome these challenges. On the software side, we propose a tool, called Golden-Finger, that dynamically adjusts the scheduling policy of the Linux process scheduler. This mechanism can improve the performance of a specified process by letting it occupy a processor exclusively. On the hardware side, we design an effective mechanism, called Back-Door, that enables communication between two independent processors that cannot otherwise operate together, such as the dual PowerPC 405 cores in the Xilinx ML310 system. The experimental results show that both mechanisms yield significant performance improvements.
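
    The abstract does not show how Golden-Finger is implemented. As a rough illustration of letting one process occupy a processor on its own, the following Linux-only Python sketch pins a target process to one CPU with `os.sched_setaffinity`, optionally switches it to the `SCHED_FIFO` real-time policy, and moves every other process it is permitted to touch onto the remaining CPUs. The function names are hypothetical and this is not the Golden-Finger tool; production setups would more likely rely on cpusets or the `isolcpus` boot parameter.

```python
import os


def exclusive_masks(available_cpus, reserved_cpu):
    """Split the available CPUs into a one-CPU mask for the target process
    and a mask holding every other CPU for the remaining processes."""
    available = set(available_cpus)
    if reserved_cpu not in available:
        raise ValueError(f"CPU {reserved_cpu} is not available")
    others = available - {reserved_cpu}
    if not others:
        raise ValueError("at least two CPUs are needed to reserve one")
    return {reserved_cpu}, others


def reserve_cpu(target_pid, reserved_cpu, rt_priority=None):
    """Linux-only: pin target_pid to reserved_cpu and move other processes off it.

    Returns how many other processes had their affinity changed. Touching
    other users' processes and using SCHED_FIFO typically require root.
    sched_setaffinity acts on a single thread ID, so extra threads listed
    under /proc/<pid>/task are not moved in this sketch."""
    target_pid = target_pid or os.getpid()  # pid 0 means "this process"
    target_mask, other_mask = exclusive_masks(os.sched_getaffinity(0), reserved_cpu)
    os.sched_setaffinity(target_pid, target_mask)
    if rt_priority is not None:
        # Real-time FIFO policy; valid static priorities on Linux are 1..99.
        os.sched_setscheduler(target_pid, os.SCHED_FIFO, os.sched_param(rt_priority))
    moved = 0
    for entry in os.listdir("/proc"):
        if not entry.isdigit() or int(entry) == target_pid:
            continue
        try:
            os.sched_setaffinity(int(entry), other_mask)
            moved += 1
        except OSError:
            # Per-CPU kernel threads, exited processes, or missing permissions.
            pass
    return moved
```

    Run as root, a call such as `reserve_cpu(1234, 3, rt_priority=50)` (hypothetical PID) would leave CPU 3 to PID 1234 alone; `exclusive_masks` is kept separate so the CPU split can be checked without changing any process.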

    Methods for Optimizing OpenCL Applications on Heterogeneous Multicore Architectures

    Abstract: Heterogeneous multicore architectures that pair a CPU with add-on GPUs or streaming processors are now widely used in computer systems. These GPUs provide substantially more computation capability and memory bandwidth than traditional multicore processors. Because they are highly programmable, they deliver the computational performance needed for realistic graphics rendering, and general-purpose computations can also be offloaded to them. This study discusses the architectures of these highly efficient GPUs and applies a unified programming standard, OpenCL, to fully utilize their capabilities. Despite their great potential, developing applications for these GPUs is challenging because of their diverse underlying architectural characteristics. In this study, several optimization techniques are applied on OpenCL-compatible heterogeneous multicore architectures to exploit thread-level and data-level parallelism, and the architectural implications of these techniques are discussed. Finally, optimization principles for these architectures are proposed. The experimental results reveal average speedups of 24 and 430 for non-optimized and optimized kernels, respectively.
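
    The abstract does not list the specific optimizations used. Two common OpenCL tuning steps for data-level and thread-level parallelism are vectorizing each work-item (for example, processing `float4` values instead of scalars) and choosing a work-group (local) size, with the global size padded to a multiple of it as OpenCL 1.x requires. The hypothetical helper below computes such a 1-D NDRange; the vector width of 4 and local size of 64 are illustrative defaults, not values from the study.

```python
def ndrange(n_elements, vector_width=4, local_size=64):
    """Return (global_size, local_size) for a 1-D kernel in which each
    work-item handles `vector_width` consecutive elements."""
    if n_elements <= 0 or vector_width <= 0 or local_size <= 0:
        raise ValueError("all sizes must be positive")
    work_items = -(-n_elements // vector_width)              # ceiling division
    global_size = -(-work_items // local_size) * local_size  # pad to a multiple of local_size
    return global_size, local_size
```

    For 1000 elements this yields 250 useful work-items padded to 256, so the kernel should return early when `get_global_id(0) * 4 >= n` and handle the final partial vector separately.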

    Novel Memory Access Scheduling Algorithms for a Surveillance System

    The continuously growing functionality of digital video surveillance requires surveillance systems to integrate more streaming processors in order to serve more cameras and record their raw video streams. However, the memory subsystem cannot provide the necessary bandwidth and becomes the bottleneck of the whole system. Improving memory access performance is therefore a major challenge in designing a modern surveillance system. This study proposes novel memory access scheduling algorithms, with a corresponding memory controller called the Self-Adjustable Memory System (SAMS), for a multiple-channel streaming system-on-a-chip. By integrating Access Buffers, a Frontend Scheduler, a Reorder Block, a Backend Scheduler, and two scheduling algorithms, SAMS can provide sufficient memory bandwidth for streaming processors with high bandwidth requirements, and the utilization of multiple DRAM banks improves accordingly. The experimental results show that SAMS allocates enough bandwidth for streaming processors with bursty transfer requirements, achieving a 3.9X speedup over a conventional memory subsystem.
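
    The abstract does not spell out the two SAMS scheduling algorithms. As a point of reference only, the sketch below implements the widely known first-ready, first-come-first-served (FR-FCFS) policy, which reorders requests so that accesses hitting an already-open DRAM row are served first, the kind of work a reorder stage in front of multiple banks performs. It is a stand-in, not SAMS; the `Request` fields and the one-open-row-per-bank model are simplifying assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    req_id: int
    bank: int
    row: int


def fr_fcfs_order(queue, open_rows):
    """Return request ids in service order under FR-FCFS.

    queue: requests in arrival order.
    open_rows: bank -> currently open row (banks missing from the dict are closed).
    """
    pending = list(queue)
    open_rows = dict(open_rows)  # copy so the caller's mapping is left untouched
    order = []
    while pending:
        # First ready: the oldest request whose row is already open in its bank.
        chosen = next((r for r in pending if open_rows.get(r.bank) == r.row), None)
        if chosen is None:
            chosen = pending[0]  # otherwise, first come, first served
        pending.remove(chosen)
        open_rows[chosen.bank] = chosen.row  # serving it leaves its row open
        order.append(chosen.req_id)
    return order
```

    Compared with strict arrival order, row hits are pulled forward, which avoids repeated precharge and activate cycles on the same bank at the cost of delaying older row-miss requests.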

    CPPM: a Comprehensive Power-aware Processor Manager for a Multicore System

    The growing functionality of mobile devices drives increasing system performance requirements and the subsequent wide adoption of multicore processors. Because mobile systems are battery powered, battery life largely limits these high-performance multicore devices, and developing an efficient power-aware processor manager for mobile multicore systems has therefore received considerable attention. The conventional processor management of embedded systems, e.g., the Linux kernel scheduler, incorporates an automatic scheme to control peripheral operations and processor frequency. However, this mechanism fails to consider user requirements, task load, and the operating status of the processors in a multicore system. Therefore, this work presents a novel power-aware multicore processor manager, referred to as the comprehensive power-aware processor manager (CPPM), which integrates a system configuration selection algorithm (BPM-DFS), a task re-scheduling mechanism (CTM), and a precise system power estimation mechanism (PPM). CPPM can dynamically set system configurations and rearrange running tasks among multiple cores to comply with the power consumption limit assigned by the user. The proposed CPPM is implemented on a quad-core x86 Android system and compared with other scheduling mechanisms.
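
    The abstract does not give the BPM-DFS algorithm. A minimal sketch of selecting a configuration under a user-assigned power limit might enumerate (active cores, frequency) pairs, estimate each pair's power, and keep the highest-throughput pair that fits. The linear power model, its constants, and the throughput proxy are hypothetical placeholders for what a mechanism like PPM would estimate on real hardware.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Config:
    cores: int
    freq_mhz: int


def estimate_power_mw(cfg, static_mw=500):
    # Hypothetical model: static power plus 0.8 mW per active core per MHz.
    return static_mw + cfg.cores * cfg.freq_mhz * 4 // 5


def select_config(core_options, freq_options, budget_mw, runnable_tasks):
    """Return the highest-throughput Config whose estimated power fits the
    budget (ties broken by lower power), or None if nothing fits."""
    best, best_key = None, None
    for cores, freq in product(core_options, freq_options):
        cfg = Config(cores, freq)
        power = estimate_power_mw(cfg)
        if power > budget_mw:
            continue  # violates the user's power limit
        # Throughput proxy: cores beyond the number of runnable tasks sit idle.
        throughput = min(cores, runnable_tasks) * freq
        key = (throughput, -power)
        if best_key is None or key > best_key:
            best, best_key = cfg, key
    return best
```

    With two runnable tasks and a 3100 mW budget, two cores at 1600 MHz (an estimated 3060 mW) beat four cores at 800 MHz, since the extra two cores would sit idle.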